An Improved Policy Iteration Algorithm
Abstract
A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is the representation of a policy as a finite-state controller, which makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.
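To illustrate why a finite-state controller makes policy evaluation straightforward, the following is a minimal sketch and not the paper's implementation. It assumes each controller node has a fixed action and an observation-indexed successor node, so the value of node n in state s satisfies V(n, s) = R(s, a) + γ Σ_{s'} T(s, a, s') Σ_o O(a, s', o) V(succ(n, o), s'). The function name, the array layouts `T[s][a][s2]`, `O[a][s2][o]`, `R[s][a]`, and the toy problem are all assumptions made for illustration. The system is linear and could be solved directly; the sketch uses repeated sweeps instead to stay dependency-free.

```python
def evaluate_controller(actions, successors, T, O, R, gamma,
                        tol=1e-10, max_sweeps=100_000):
    """Evaluate a finite-state controller on a POMDP by fixed-point sweeps.

    actions[n]       -> action taken in controller node n
    successors[n][o] -> next controller node after observing o in node n
    T[s][a][s2]      -> probability of moving from state s to s2 under action a
    O[a][s2][o]      -> probability of observing o after action a lands in s2
    R[s][a]          -> immediate reward for action a in state s
    Returns V with V[n][s] = expected discounted return of node n in state s.
    """
    n_nodes, n_states = len(actions), len(R)
    n_obs = len(O[0][0])
    V = [[0.0] * n_states for _ in range(n_nodes)]
    for _ in range(max_sweeps):
        new_V = [[0.0] * n_states for _ in range(n_nodes)]
        delta = 0.0
        for n in range(n_nodes):
            a = actions[n]
            for s in range(n_states):
                future = 0.0
                for s2 in range(n_states):
                    for o in range(n_obs):
                        future += (T[s][a][s2] * O[a][s2][o]
                                   * V[successors[n][o]][s2])
                new_V[n][s] = R[s][a] + gamma * future
                delta = max(delta, abs(new_V[n][s] - V[n][s]))
        V = new_V
        if delta < tol:  # values have stopped changing
            break
    return V


# Hypothetical toy problem: 2 states, 2 actions (0 = stay, 1 = flip),
# 2 uninformative observations. Reward 1 only for staying in state 0.
T = [[[1.0, 0.0], [0.0, 1.0]],   # from state 0: stay -> 0, flip -> 1
     [[0.0, 1.0], [1.0, 0.0]]]   # from state 1: stay -> 1, flip -> 0
O = [[[0.5, 0.5], [0.5, 0.5]],
     [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0],
     [0.0, 0.0]]

# Two-node controller: node 0 always stays; node 1 flips once, then goes to node 0.
actions = [0, 1]
successors = [[0, 0], [0, 0]]

V = evaluate_controller(actions, successors, T, O, R, gamma=0.9)
```

In this toy case node 0 started in state 0 collects reward 1 forever, and node 1 started in state 1 flips into state 0 first, so its value is the node 0 value discounted by one step.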
Similar resources
An Improved Imperialist Competitive Algorithm based on a new assimilation strategy
Meta-heuristic algorithms inspired by natural processes, such as the genetic algorithm, particle swarm optimization, ant colony optimization, and the firefly algorithm, are a class of optimization algorithms that has received attention in recent years. Recently, a new kind of evolutionary algorithm has been proposed that is inspired by the human sociopolitical evolution process. This new algorith...
An Improved Algorithm for Network Reliability Evaluation
A binary decision diagram (BDD) is a data structure that has proved to be compact in representation and efficient in the manipulation of Boolean formulas. The use of BDDs in network reliability analysis has already been investigated by some researchers. In this paper we show how an exact algorithm for network reliability can be improved and implemented efficiently by using CUDD - Colorado Univer...
Optimization of Thermal Instability Resistance of FG Flat Structures using an Improved Multi-objective Harmony Search Algorithm
This paper presents a clear monograph on the optimization of the thermal instability resistance of FG (functionally graded) flat structures. To this end, two FG flat structures, namely an FG beam and an FG circular plate, are considered. These structures are assumed to obey the first-order shear deformation theory, a three-parameter power-law distribution of the constituents, and clamped bounda...
Bat Algorithm for Optimal Service Parameters in an Impatient Customer N-Policy Vacation Queue
In this paper, a meta-heuristic method, the Bat Algorithm, based on the echolocation behavior of bats, is used to determine the optimum service rate of a queueing problem. A finite-buffer M/M/1 queue with N-policy, multiple working vacations, and Bernoulli schedule vacation interruption is considered. Under two forms of customer impatience, balking and reneging, the...
Tuning of fuzzy logic controller using an improved black hole algorithm for maximizing power capture of ocean wave energy converters
Seas and oceans are the most important sources of renewable energy in the world. The main purpose of this paper is to use an appropriate control strategy to improve the performance of point absorbers. In this scheme, considering the high uncertainty in the parameters of the power take-off system in different atmospheric conditions, a new improved black hole algorithm is introduced to tune fuzzy...